Burford, Clint, Steven Bird and Timothy Baldwin (to appear) Collective Document Classification with Implicit Inter-document Semantic Relationships, In Proceedings of *SEM 2015: The Fourth Joint Conference on Lexical and Computational Semantics, Denver, USA

Authors

  • Clinton Burford
  • Steven Bird
  • Timothy Baldwin
Abstract

This paper addresses the question of how document classifiers can exploit implicit information about document similarity to improve document classifier accuracy. We infer document similarity using simple n-gram overlap, and demonstrate that this improves overall document classification performance over two datasets. As part of this, we find that collective classification based on simple iterative classifiers outperforms the more complex and computationally-intensive dual classifier approach.
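As a rough illustration of the "simple n-gram overlap" idea mentioned in the abstract, the sketch below scores two documents by the overlap of their word n-gram sets. The abstract does not state which overlap measure, tokenization, or n-gram size the authors use, so the Jaccard ratio, whitespace tokenization, default `n=3`, and the function names `word_ngrams` and `ngram_overlap` are all assumptions made for illustration, not the paper's method.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    # If the text has fewer than n tokens, the range is empty and so is the set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(doc_a, doc_b, n=3):
    """Jaccard overlap of word n-gram sets (an assumed similarity measure)."""
    a = word_ngrams(doc_a, n)
    b = word_ngrams(doc_b, n)
    if not a or not b:
        # No n-grams to compare: treat the documents as unrelated.
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    score = ngram_overlap("the cat sat on the mat", "the cat sat on the rug", n=2)
    print(round(score, 3))
```

In a collective classification setting, pairs of documents whose score exceeds some threshold could be treated as linked, so that the predicted label of one informs the other; the threshold and the linking rule would need to be taken from the paper itself.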

Similar resources

On the Problem of Lexical Semantic Change

The article provides an insight into a problem of lexical semantic change. A short historical outline of the development of semantic studies is given. The authors analyze some of the most important stages in the history of the formation of this field. The existing approaches to dealing with form and meaning, namely semasiological and onomasiological ones are discussed. The authors consider the ...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, *SEM 2015, June 4-5, 2015, Denver, Colorado, USA

This paper proposes neural networks for integrating compositional and non-compositional sentiment in the process of sentiment composition, a type of semantic composition that optimizes a sentiment objective. We enable individual composition operations in a recursive process to possess the capability of choosing and merging information from these two types of sour...

Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs

The existing literature on Bantu verbal semantics demonstrated that inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...

Publication date: 2015